Gene finding by integrating gene finders
نویسندگان
چکیده
Gene finding, the accurate annotation of genomic DNA, has become one of the central topics in biological research. Although various computational methods (gene finders) have been proposed and developed, they all have their own limitations in gene findings. In this paper, we introduce an integrating gene finder, which combines the results of several existing gene finders together, to improve the accuracy of gene finding. Four integration schemes, based on majority voting, are developed for the analysis of two datasets – the basic dataset and the testing dataset. The basic dataset consists of 1500 DNA sequences and the testing dataset consists of 103 DNA sequences. It is demonstrated that a simple integration (a simple voting for each nucleotide) can significantly improve the finding performance, and removing confusing gene finders, caused by poor performance or redundant results, is important for a further improvement of the integration. The best prediction results are obtained using weighted majority voting, aided by the mRMR (Minimum Redundancy Maximum Relevance) (Peng, 2005) method for the gene finder selection. The prediction accuracies are 84.16% and 90.06% for the basic dataset and testing dataset respectively, which are better than any individual gene finding software in our research.
منابع مشابه
Discovery of transcription factor binding sites through integration of generic motif finders
Locating transcription factor binding sites is a key step in understanding gene regulation. Due to its importance, many de novo motif finding methods (e.g. MEME, MotifSampler, Mitra and Weeder) have been proposed. Individually, these motif finders perform unimpressively overall based on Tompa’s benchmark datasets. Moreover, these motif finders vary in their definitions of what constitute a moti...
متن کاملDIGIT: A Novel Gene Finding Program by Combining Gene-Finders
We have developed a general purpose algorithm which finds genes by combining plural existing gene-finders. The algorithm has been implemented into a novel gene-finder named DIGIT. An outline of the algorithm is as follows. First, existing gene-finders are applied to an uncharacterized genomic sequence (input sequence). Next, DIGIT produces all possible exons from the results of gene-finders, an...
متن کاملEvaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques
Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen's genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab ...
متن کاملGenomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron-exon structure
MOTIVATION Correct gene predictions are crucial for most analyses of genomes. However, in the absence of transcript data, gene prediction is still challenging. One way to improve gene-finding accuracy in such genomes is to combine the exons predicted by several gene-finders, so that gene-finders that make uncorrelated errors can correct each other. RESULTS We present a method for combining ge...
متن کاملDevelopment of joint application strategies for two microbial gene finders
MOTIVATION As a starting point in annotation of bacterial genomes, gene finding programs are used for the prediction of functional elements in the DNA sequence. Due to the faster pace and increasing number of genome projects currently underway, it is becoming especially important to have performant methods for this task. RESULTS This study describes the development of joint application strate...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010